Abstract. Protein folding, peptide aggregation and crystallization, as well as adsorption of
molecules on soft or solid substrates have an essential feature in common: In all these processes,
structure formation is guided by a collective, cooperative behavior of the molecular subunits
lining up to build chainlike macromolecules. Proteins experience conformational transitions
related to thermodynamic phase transitions. For chains of finite length, an important difference
is, however, that crossovers between conformational (pseudo)phases are typically rather
smooth processes, i.e., thermodynamic activity is not necessarily signaled by strong entropic or
energetic fluctuations. Nonetheless, in order to understand generic properties of molecular
structure-formation processes, the analysis of mesoscopic models from a statistical physics point
of view enables first insights into the nature of conformational transitions in small systems.
Here, we review recent results obtained by means of sophisticated generalized-ensemble computer
simulations of minimalistic coarse-grained models.
INTRODUCTION

At the atomic level, proteins have a complex chemical structure which is formed and
stabilized by the electronic properties of the atoms. Thus, the precise analysis of structure
and dynamical behavior of molecules requires a detailed knowledge of the quantum
mechanics involved. For molecular systems of interest, where even the smallest molecules
contain hundreds to thousands of atoms, a quantum-mechanical analysis is usually
simply impossible, a fortiori if effects of the environment (e.g., an aqueous solution) are
non-negligible. This problem reflects the dilemma of "realistic" all-atom models which
are based on semiclassical assumptions and, consequently, depend on hundreds of
parameters mimicking quantum-mechanical effects. Nevertheless, applied with sufficient care,
such models are often indispensable if specific questions on atomic scales
are to be investigated.

However, from a physics point of view, one may ask: Is it really necessary at all to 
employ all-atom models, if one is interested in generic features of molecular mechanics 
which typically anyway requires a cooperative action of larger subunits (monomers)? 
The answer would be "no", if conformational transitions accompanying molecular 
structure-formation processes indeed exhibit similarities to thermodynamic phase tran- 
sitions, in which case it should be possible to reveal general, qualitative properties by 
analyses of suitably simplified models on mesoscopic scales [111, 120 . 

After a few evolutionary remarks and introducing typical mesoscopic models, we
eventually present results from studies of protein folding and peptide aggregation
processes which lead to the conclusion that characteristic features of the identified
conformational transitions are also relevant in corresponding natural structuring processes.

THE EVOLUTIONARY ASPECT 

The number of different functional proteins encoded in the human DNA is of order
100 000, an extremely small number compared to the total number of possibilities:
Recalling that 20 amino acids line up natural proteins and typical proteins consist of
$N \sim \mathcal{O}(10^2 - 10^3)$ amino acid residues, the number of possible primary structures $20^N$
lies somewhere far, far above $20^{100} \sim 10^{130}$. Assuming all proteins were of size $N = 100$
and a single folding event would take 1 ms, a sequential enumeration process would need
about $10^{119}$ years to generate structures of all sequences, irrespective of the decision
about their "fitness", i.e., the functionality and ability to efficiently cooperate with other
proteins in a biological system. Of course, one might argue that evolution is a highly
parallelized process which drastically increases the generation rate. So, we can ask
how many processes can maximally run in parallel.

The universe contains of the order of $10^{80}$ protons. Assuming further that an average
amino acid consists of at least 50 protons, a chain with $N = 100$ amino acids has of
the order $5 \times 10^{3}$ protons, i.e., $10^{76}$ sequences could be generated in each millisecond
(forgetting for the moment that some proton-containing machinery is necessary for the
generation process and only a small fraction of protons is assembled in earth-bound
organic matter). The age of our universe is about $10^{10}$ years (we also forget that the
Earth is even about one order of magnitude younger) or $10^{20}$ ms. Hence, about $10^{96}$
sequences could have been tested to date, if our drastic simplifications were right. But
even this still much too optimistic estimate is noticeably smaller than the above
mentioned reference number of $10^{130}$ possible sequences for a 100-mer.
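
The order-of-magnitude estimate above can be retraced with a few lines of arithmetic. The following is only a sketch under the assumptions stated in the text (20 amino acids, chains of 100 residues, one sequence per millisecond, about $10^{80}$ protons, at least 50 protons per residue, an age of the universe of about $10^{10}$ years); all variable names are illustrative. Working with base-10 logarithms avoids handling very large numbers directly.

```python
import math

N_RESIDUES = 100                 # chain length used in the text
N_AMINO_ACIDS = 20               # size of the amino acid alphabet
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000.0

# log10 of the number of possible sequences, 20^100
log10_sequences = N_RESIDUES * math.log10(N_AMINO_ACIDS)

# sequential enumeration at one sequence per millisecond, expressed in years
log10_years_sequential = log10_sequences - math.log10(MS_PER_YEAR)

# maximally parallel scenario: all protons of the universe form 100-mers
log10_protons = 80.0                     # order-of-magnitude assumption
protons_per_chain = 50 * N_RESIDUES      # at least 50 protons per residue
log10_chains_in_parallel = log10_protons - math.log10(protons_per_chain)

# age of the universe (about 1e10 years) in milliseconds
log10_age_ms = 10.0 + math.log10(MS_PER_YEAR)
log10_tested = log10_chains_in_parallel + log10_age_ms

print(f"possible 100-mer sequences: 10^{log10_sequences:.1f}")
print(f"sequential enumeration    : 10^{log10_years_sequential:.1f} years")
print(f"chains built in parallel  : 10^{log10_chains_in_parallel:.1f}")
print(f"sequences tested to date  : 10^{log10_tested:.1f}")
```

Even in this deliberately generous scenario, the number of tested sequences stays more than thirty orders of magnitude below the number of possible 100-mers.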

At least two conclusions can be drawn from this crude analysis. One is that the
evolutionary process of generating and selecting sequences is ongoing, as it is likely that
only a small fraction of functional proteins has so far been identified by nature. On the
other hand, the existence of complex biological systems, where hundreds of thousands
of different types of macromolecules interact efficiently, can only be explained by means
of efficient evolutionary strategies of adaptation to environmental conditions on Earth
which dramatically changed through billions of years. Furthermore, the development
from primitive to complex biological systems leads to the conclusion that within the
evolutionary process of protein design, particular patterns in the genetic code have
survived over generations, while others were improved (or deselected) by recombinations,
selections, and mutations. But the sequence question is only one side. Another regards
the geometric structures of proteins which are directly connected to biological func- 
tionalities. The conformational similarity among human functional proteins is also quite 
surprising; only of the order of 1 000 significantly different "folds" were identified [ij]. 

Since the conformation space is infinitely large because of the continuous degrees of
freedom and the sequence space is also gigantic, the protein folding problem is typically
attacked from two sides: the direct folding problem, where the amino acid sequence is
given and the associated native, functional conformation has to be identified, and the
inverse folding problem, where one is interested in all sequences that fold into a given
target conformation. With these two approaches, it is, however, virtually impossible to
unravel evolutionary factors that led to the set of present functional proteins. Only for
very simple protein models is a comprehensive statistical analysis of sequence and
conformation spaces possible.

FIGURE 1. Coarse-graining peptides in a "united atom" approach. Each amino acid is contracted
to a single "C$^\alpha$" interaction point. The effective distance between adjacent, bonded interaction sites is
about 3.8 Å. In the coarse-grained hydrophobic-polar models considered here, the interaction sites have
no steric extension. The excluded volume is modeled via type-specific Lennard-Jones pair potentials. In
hydrophobic-polar (HP) peptide models, only hydrophobic (H) and polar (P) amino acid residues are
distinguished.

SIMPLE APPROACHES TO COARSE-GRAINED MODELING OF PROTEINS

Coarse-graining of models, where relevant length scales are increased by reducing the
number of microscopic degrees of freedom, has proven to be very successful in polymer
science and protein folding [2]. Although specificity is much more important for
proteins, since details (charges, polarity, etc.) and differences of the amino acid side chains
can have strong influences on the fold, mesoscopic approaches are also here of
essential importance for the basic understanding of conformational transitions affecting the
folding process. They are also the only possible approach for systematic analyses of basic
problems such as the evolutionarily significant question why only a few sequences in
nature are "designing" and thus relevant for selective functions. On the other hand, what
is the reason why proteins prefer a comparatively small set of target structures, i.e., what
explains the preference of designing sequences to fold into the same three-dimensional
structure? Many of these questions are still largely unanswered. Actually, the
complexity of these questions requires a huge number of comparative studies of complete
classes of peptide sequences and structures that cannot be achieved by means of
computer simulations of microscopic models. Currently only two approaches are promising.
One is the bioinformatics approach of designing and scoring sequences and structures
(and also possible combinations of receptors and ligands in aggregates), often based
on database scanning according to certain criteria. Another, more physically motivated
approach makes use of coarse-grained models, where only a few specific properties of
the monomers enter into the models. Since a characteristic feature of non-membrane
proteins is to possess a compact hydrophobic core, screened from the surrounding
solvent by a shell of polar monomers, frequently only two types of amino acids are
distinguished: hydrophobic (H) and polar (P) residues, giving the class of corresponding
models the name "hydrophobic-polar" (HP) models (see Fig. 1).



The HP model for lattice proteins

In the simplest case, the HP peptide chain is a linear, self-avoiding chain of H and
P residues on a regular lattice [l3|,|j]. Such models allow a comprehensive analysis of both
the conformation and the sequence space, e.g., by exactly enumerating all combinatorial
possibilities [5]. Other important aspects in lattice model studies are the identification of
lowest-energy conformations of comparatively long sequences and the characterization
of the folding thermodynamics [6].

In the HP model, a monomer of an HP sequence $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$ is characterized
by its residual type ($\sigma_i = P$ for polar and $\sigma_i = H$ for hydrophobic residues), the position
$1 \le i \le N$ within the chain of length $N$, and the spatial position $\mathbf{x}_i$ to be measured in units
of the lattice spacing. A conformation is then symbolized by the vector of the coordinates
of successive monomers, $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$. The distance between the $i$th and the $j$th
monomer is denoted by $x_{ij} = |\mathbf{x}_i - \mathbf{x}_j|$. The bond length between adjacent monomers
in the chain is identical with the spacing of the used regular lattice with coordination
number $q$. These covalent bonds are thus not stretchable. A monomer and its nonbonded
nearest neighbors may form so-called contacts. Therefore, the maximum number of
contacts of a monomer within the chain is $(q - 2)$ and $(q - 1)$ for the monomers at
the ends of the chain. To account for the excluded volume, lattice proteins are
self-avoiding, i.e., two monomers cannot occupy the same lattice site. The total energy for
an HP protein reads in energy units $\varepsilon_0$ (we set $\varepsilon_0 = 1$ in the following)

$$E_{\mathrm{HP}} = \varepsilon_0 \sum_{i,\,j>i+1} C_{ij}\, U_{\sigma_i\sigma_j}, \qquad (1)$$

where $C_{ij} = (1 - \delta_{i+1,j})\,\Delta(x_{ij} - 1)$ with

$$\Delta(z) = \begin{cases} 1, & z = 0, \\ 0, & z \neq 0, \end{cases} \qquad (2)$$

is a symmetric $N \times N$ matrix called contact map and

$$U_{\sigma_i\sigma_j} = \begin{pmatrix} u_{HH} & u_{HP} \\ u_{HP} & u_{PP} \end{pmatrix} \qquad (3)$$

is the $2 \times 2$ interaction matrix. Its elements $u_{\sigma_i\sigma_j}$ correspond to the energy of HH, HP,
and PP contacts. For labeling purposes we shall adopt the convention that $\sigma_i = 0 \mathrel{\hat{=}} P$
and $\sigma_i = 1 \mathrel{\hat{=}} H$.

In the simplest formulation fl], only the attractive hydrophobic interaction is nonzero,
$u_{HH} = -1$, $u_{HP} = u_{PP} = 0$. Therefore, $U_{\sigma_i\sigma_j} = -\delta_{\sigma_i H}\,\delta_{\sigma_j H}$. This parametrization has
been extensively used to identify ground states of HP sequences, some of which are
believed to exhibit qualitative properties comparable with realistic proteins whose
20-letter sequence was transcribed into the 2-letter code of the HP model [7, 8, 9, 10, 11].

This simple form of the standard HP model suffers, however, from the fact that the
lowest-energy states are usually highly degenerate and therefore the number of designing
sequences (i.e., sequences with a unique ground state) is very small. When additional
inter-residue interactions are incorporated [3], symmetries are broken, degeneracies are
reduced, and the number of designing sequences increases [5].
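
To make the energy function of the lattice HP model concrete, the following minimal sketch evaluates it for a given conformation. It assumes a hypercubic lattice with integer coordinates and unit lattice spacing (so that a contact is a nonbonded pair at squared distance 1) and uses the simplest parametrization, $u_{HH} = -1$, $u_{HP} = u_{PP} = 0$, as defaults. The function names `squared_distance` and `hp_energy` are illustrative and not taken from any particular code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two lattice sites."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hp_energy(sequence, coords, u_hh=-1.0, u_hp=0.0, u_pp=0.0):
    """Energy of an HP lattice conformation in units of eps_0.

    sequence: string of 'H' and 'P'
    coords:   list of integer tuples, one lattice site per monomer
    """
    n = len(sequence)
    if n != len(coords):
        raise ValueError("sequence and coordinates differ in length")
    if len(set(coords)) != n:
        raise ValueError("conformation is not self-avoiding")
    if any(squared_distance(coords[i], coords[i + 1]) != 1 for i in range(n - 1)):
        raise ValueError("bonded monomers must be nearest lattice neighbors")

    u = {"HH": u_hh, "HP": u_hp, "PH": u_hp, "PP": u_pp}
    energy = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):                 # j > i + 1 skips bonded pairs
            if squared_distance(coords[i], coords[j]) == 1:   # contact
                energy += u[sequence[i] + sequence[j]]
    return energy


# A 4-mer folded into a square: the two chain ends form one HH contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

For the stretched chain on a line no contacts exist and the energy vanishes, which illustrates why compact hydrophobic cores are energetically favored in this model.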



A simple off-lattice generalization: The AB model 

Since lattice models suffer from undesired effects of the underlying lattice
symmetries, simple hydrophobic-polar off-lattice models were defined. One such model is the
AB model, where, for historical reasons, A symbolizes hydrophobic and B polar regions
of the protein, whose conformations are modeled by polymer chains in continuum space
governed by effective bending energy and van der Waals interactions [v^]. These models
allow for the analysis of different mutated sequences with respect to their folding
characteristics. Here, the idea is that the folding transition is a kind of pseudophase transition
which can in principle be described by one or a few order-like parameters. Depending
on the sequence, the folding process can be highly cooperative (downhill folding), less
cooperative depending on the height of a free-energy barrier (two-state folding), or even
frustrated due to the existence of different barriers in a metastable regime (crystal or
glassy phases) [13, 14]. These characteristics known from functional proteins can be
recovered in the AB model, which is computationally much less demanding than all-atom
formulations and thus enables thorough theoretical analyses.

We denote the spatial position of the $i$th monomer in a heteropolymer consisting of
$N$ residues by $\mathbf{x}_i$, $i = 1, \ldots, N$, and the vector connecting nonadjacent monomers $i$ and $j$
by $\mathbf{r}_{ij} = \mathbf{x}_i - \mathbf{x}_j$. For covalent bond vectors, we set $|\mathbf{b}_i| = |\mathbf{r}_{i,i+1}| = 1$. The bending angle
between monomers $k$, $k+1$, and $k+2$ is $\vartheta_k$ ($0 \le \vartheta_k \le \pi$) and $\sigma_i = A, B$ symbolizes the
type of the monomer. In the AB model [12], the energy of a conformation is given by

$$E_{\mathrm{AB}} = \frac{1}{4}\sum_{k=1}^{N-2}\left(1 - \cos\vartheta_k\right) + 4\sum_{i=1}^{N-2}\sum_{j=i+2}^{N}\left(\frac{1}{r_{ij}^{12}} - \frac{C(\sigma_i,\sigma_j)}{r_{ij}^{6}}\right), \qquad (4)$$

where the first term is the bending energy and the sum runs over the $(N - 2)$ bending
angles of successive bond vectors. The second term partially competes with the
bending barrier by a potential of Lennard-Jones type. It depends on the distance between
monomers being nonadjacent along the chain and accounts for the influence of the AB
sequence on the energy. The long-range behavior is attractive for pairs of like monomers
and repulsive for AB pairs of monomers:

$$C(\sigma_i,\sigma_j) = \begin{cases} +1, & \sigma_i,\sigma_j = A, \\ +1/2, & \sigma_i,\sigma_j = B, \\ -1/2, & \sigma_i \neq \sigma_j. \end{cases} \qquad (5)$$

The AB model is a C$^\alpha$ type model in that each residue is represented by a single
interaction site only, the "C$^\alpha$ atom" (see Fig. 1). Thus, the natural dihedral torsional
degrees of freedom of realistic protein backbones are replaced by virtual bond and
torsion angles. The large torsional barrier of the peptide bond between neighboring
amino acids is in the AB model effectively taken into account by introducing the bending
energy.
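
The AB energy function can be illustrated by a short sketch. It assumes unit bond lengths, so that the cosine of the bending angle is taken as the scalar product of successive bond vectors (a stretched chain then has zero bending energy), and it uses the standard AB couplings: $+1$ for AA pairs, $+1/2$ for BB pairs, and $-1/2$ for AB pairs. The function names `ab_coupling` and `ab_energy` are illustrative.

```python
import math


def ab_coupling(a, b):
    """Sequence-dependent prefactor C of the attractive r^-6 term."""
    if a == b:
        return 1.0 if a == "A" else 0.5
    return -0.5


def ab_energy(sequence, positions):
    """Energy of an AB heteropolymer with unit bond lengths.

    sequence:  string of 'A' and 'B'
    positions: list of 3D coordinates, one per monomer
    """
    n = len(sequence)
    if n != len(positions):
        raise ValueError("sequence and positions differ in length")

    # Bond vectors b_k = x_{k+1} - x_k, assumed to have unit length.
    bonds = [[q - p for p, q in zip(positions[k], positions[k + 1])]
             for k in range(n - 1)]

    bending = 0.0
    for k in range(n - 2):
        cos_theta = sum(u * v for u, v in zip(bonds[k], bonds[k + 1]))
        bending += 0.25 * (1.0 - cos_theta)

    lennard_jones = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):              # nonadjacent pairs only
            r = math.dist(positions[i], positions[j])
            c = ab_coupling(sequence[i], sequence[j])
            lennard_jones += 4.0 * (r ** -12 - c * r ** -6)

    return bending + lennard_jones


# Stretched trimer: no bending energy, one nonbonded pair at distance 2.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ab_energy("AAA", straight), ab_energy("AAB", straight))
```

In the stretched trimer the AA end pair lowers the energy while the AB end pair raises it, reflecting the attractive and repulsive long-range behavior described above.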

Although this coarse-grained picture will obviously not be sufficient to reproduce
microscopic properties of specific realistic proteins, it nevertheless qualitatively exhibits
sequence-dependent features known from nature, for example, tertiary folding
pathways characteristic for two-state folding, folding through intermediates, or
metastability [14], and two-state kinetics [13].



THERMODYNAMICS OF HETEROPOLYMER FOLDING 

For the analysis of conformational transitions accompanying the tertiary folding
behavior of lattice proteins, multicanonical chain-growth simulations |0, 0] can be efficiently
performed for the HP model. An example is the 42-mer with the sequence
$\mathrm{PH_2PHPH_2PHPHP_2H_3PHPH_2PHPH_3P_2HPHPH_2PHPH_2P}$ that forms a parallel helix in the ground
state. Originally, it was designed to serve as a lattice model of the parallel $\beta$ helix of
pectate lyase C [|T^ . But there are additional properties that make it an interesting and
challenging system. The ground-state energy is known to be $E_{\min} = -34$. In the
simulations, the ground-state degeneracy was estimated to be $g_0 = 3.9 \pm 0.4$ ['2', '5'], which
is in perfect agreement with the known value of 4 (excluding translational, rotational,
and reflection symmetries) [17]. As we will see, there are two conformational
transitions. At low temperatures, fluctuations of energetic and structural quantities signal
a (pseudo)transition between the lowest-energy states possessing compact hydrophobic
cores and the regime of globular conformations, and at a higher temperature, there is
another transition between globules and random coils.

The average structural properties at finite temperatures can be characterized best by
the mean end-to-end distance $\langle R_{ee}\rangle(T)$ and the mean radius of gyration $\langle R_{gyr}\rangle(T)$.
Multicanonical chain-growth simulation results for $\langle R_{ee}\rangle(T)$ and $\langle R_{gyr}\rangle(T)$ of the 42-mer
are shown in Fig. 2. The pronounced minimum in the end-to-end distance can be
interpreted as an indication of the transition between the lowest-energy states and globules:
The small number of ground states have similar and highly symmetric shapes (due to the
reflection symmetry of the sequence) but the ends of the chain are polar and therefore
they are not required to reside close to each other. Increasing the temperature allows the
protein to fold into conformations different from the ground states and contacts between
the ends become more likely. Therefore, the mean end-to-end distance decreases and the
protein has entered the globular "phase". Further increasing the temperature then leads
to a disentanglement of globular structures, and random coil conformations with larger
end-to-end distances dominate. In Fig. 3, we have plotted the specific heat $C_V(T)$ and
the derivatives of the mean end-to-end distance and of the mean radius of gyration with
respect to the temperature, $d\langle R_{ee}\rangle/dT$ and $d\langle R_{gyr}\rangle/dT$.

FIGURE 2. Mean end-to-end distance $\langle R_{ee}\rangle$ and mean radius of gyration $\langle R_{gyr}\rangle$ of the 42-mer.
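
For a single conformation, the two structural quantities can be computed as in the following sketch. It assumes the standard definitions, i.e., the end-to-end distance as the distance between the first and the last monomer and the radius of gyration as the root-mean-square distance of the monomers from their center of mass (equal monomer masses). The thermal averages discussed in the text are then ensemble averages of these per-conformation values. The function names are illustrative.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and the last monomer of the chain."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Root-mean-square distance of the monomers from their center of mass."""
    n = len(coords)
    dim = len(coords[0])
    center = [sum(p[d] for p in coords) / n for d in range(dim)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


# Compact square versus stretched 4-mer on the square lattice.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(end_to_end_distance(square), radius_of_gyration(square))
print(end_to_end_distance(line), radius_of_gyration(line))
```

The compact conformation has both the smaller end-to-end distance and the smaller radius of gyration, which is why these quantities are useful indicators of collapse.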

Two temperature regions of conformational activity (shaded in gray), where the curves
of the fluctuating quantities exhibit extremal points, can clearly be separated. We
estimate the temperature region of the ground-state - globule transition to be within
$T_0^{(1)} \approx 0.24$ and $T_0^{(2)} \approx 0.28$. The globule - random coil transition takes place between
$T_1^{(1)} \approx 0.53$ and $T_1^{(2)} \approx 0.70$.

For high temperatures, random conformations are favored. In consequence, in the
corresponding, rather entropy-dominated ensemble, the highly degenerate high-energy
structures govern the thermodynamic behavior of the macrostates. A typical representative
is shown as an inset in the high-temperature pseudophase in Fig. 3. Annealing the
system (or, equivalently, decreasing the solvent quality), the heteropolymer experiences a
conformational transition towards globular macrostates. A characteristic feature of these
intermediary "molten" globules is the compactness of the dominating conformations as
expressed by a small gyration radius. Nonetheless, the conformations do not exhibit a
noticeable internal long-range symmetry and behave rather like a fluid. Local
conformational changes are not hindered by strong free-energy barriers. The situation changes
on entering the low-temperature (or poor-solvent) conformational phase. In this region,
energy dominates over entropy and the effectively attractive hydrophobic force favors
the formation of a maximally compact core of hydrophobic monomers. Polar residues
are expelled to the surface of the globule and form a shell that screens the core from the
(fictitious) aqueous environment.

The existence of the hydrophobic-core collapse renders the folding behavior of a
heteropolymer different from crystallization or amorphous transitions of
homopolymers 120]. The reason is the disorder induced by the sequence of different monomer
types. The hydrophobic-core formation is the main cooperative conformational
transition which accompanies the tertiary folding process of a single-domain protein.

FIGURE 3. Specific heat $C_V$ and derivatives w.r.t. temperature of mean end-to-end distance $\langle R_{ee}\rangle$ and
radius of gyration $\langle R_{gyr}\rangle$ as functions of temperature for the 42-mer. The ground-state - globule transition
occurs between $T_0^{(1)} \approx 0.24$ and $T_0^{(2)} \approx 0.28$, while the globule - random coil transition takes place
between $T_1^{(1)} \approx 0.53$ and $T_1^{(2)} \approx 0.70$ (shaded areas).

In Fig. 4, we have plotted the canonical distributions $p^{\mathrm{can},T}(E)$ for different
temperatures in the vicinity of the two transitions. From Fig. 4(a) we read off that the
distributions possess two peaks at temperatures within that region where the ground-state -
globule transition takes place. This is interpreted as indication of a "first-order-like"
transition, i.e., both types of macrostates coexist in this temperature region ll2lil . The
behavior in the vicinity of the globule - random coil transition is less spectacular, as can be
seen in Fig. 4(b), and since the energy distribution exhibits one peak only, this
transition could be denoted as being "second-order-like". The width of the distributions grows
with increasing temperature until it has reached its maximum value which is located near
$T \approx 0.7$. For higher temperatures, the distributions become narrower again [61].

PROTEIN FOLDING IS A FINITE-SIZE EFFECT 

Understanding protein folding by means of equilibrium statistical mechanics and
thermodynamics is a difficult task. A single folding event of a protein cannot occur "in
equilibrium" with its environment. But protein folding is often considered as a
folding/unfolding process with folding and unfolding rates which are balanced in a
stationary state that defines the "chemical equilibrium". Thus, the statistical properties of an
infinitely long series of folding/unfolding cycles under constant external conditions (which
are mediated by the surrounding solvent) can then also be understood, at least in parts,
from a thermodynamic point of view. In particular, folding and unfolding of a protein
are conformational transitions and one is tempted to simply take over the conceptual
philosophy behind thermodynamic phase transitions, in particular known from
"freezing/melting" and "condensation/evaporation" transitions of gases. But such an approach
has to be taken with great care. Thermodynamic phase transitions occur only in the
thermodynamic limit, i.e., in infinitely large systems. A protein is, however, a heteropolymer
uniquely defined by its finite amino acid sequence, which is actually comparatively short
and cannot be made longer without changing its specific properties. This is different for
polymerized molecules ("homopolymers"), where the infinite-length chain limit can be
defined, in principle. The intensely studied collapse or transition between the
random-coil and the globular phase is such a phase transition in the truest sense [18], where a
finite-size scaling towards the infinitely long chain is feasible [i^,|23]. In this case, also
a classification of the phase transitions into continuous transitions (where the latent heat
vanishes and fluctuations exhibit power-law behavior close to the critical point) and
discontinuous transitions (with nonvanishing latent heat) is possible.

FIGURE 4. Canonical distributions for the 42-mer at temperatures (a) $T = 0.24, 0.25, \ldots, 0.30$
close to the ground-state - globule transition region between $T_0^{(1)} \approx 0.24$ and $T_0^{(2)} \approx 0.28$, (b) $T =
0.50, 0.55, \ldots, 1.0$. The high-temperature peak of the specific heat in Fig. 3 is near $T_1^{(1)} \approx 0.53$, but at
$T_1^{(2)} \approx 0.73$ the distribution has the largest width |@]. Near this temperature, the mean radius of gyration
and the mean end-to-end distance (see Figs. 2 and 3) have their biggest slope.

For proteins (or heteropolymers with a "disordered" sequence), a finite-size scaling
is impossible and so is a classification of conformational transitions in a strict sense.
Nonetheless, cooperative conformational changes are often referred to as "folding",
"hydrophobic-collapse", "hydrophobic-core formation", or "glassy" transitions. All
these transitions are defined on the basis of certain parameters, also called "order
parameters" or "reaction coordinates", but should not be confused with thermodynamic
phase transitions. The onset of finite-system transitions is also less spectacular: Their
identification on the basis of peaks and "shoulders" in fluctuations of energetic and
structural quantities and interpretation in terms of "order parameters" is a rather
intricate procedure. Since the different fluctuations do not "collapse" for finite systems,
a unique transition temperature can often not be defined. Despite a surprisingly high
cooperativity, collective changes of protein conformations do not happen in a single
step. As we have seen in Fig. 3, transition regions separate the "pseudophases", where
random coils, maximally compact globules, or states with compact hydrophobic core
dominate [0,0, El].

Although the lattice models are very useful in unraveling generic folding characteris- 
tics, they suffer from lattice artifacts, which are, however, less relevant for long chains. 
In order to obtain a more precise and thus finer resolved image of folding characteristics, 
it is necessary to "get rid of the lattice" and to allow the coarse-grained protein to fold 
into the three-dimensional continuum. 



TERTIARY PROTEIN FOLDING CHANNELS FROM 
MESOSCOPIC MODELING 

Folding of linear chains of amino acids, i.e., bioproteins and synthetic peptides, is, for
single-domain macromolecules, accompanied by the formation of secondary structures
(helices, sheets, turns) and the tertiary hydrophobic-core collapse. While secondary
structures are typically localized and thus limited to segments of the peptide, the
effective hydrophobic interaction between nonbonded, nonpolar amino acid side chains
results in a global, cooperative arrangement favoring folds with compact hydrophobic
core and a surrounding polar shell that screens the core from the polar solvent.
Systematic analyses for unraveling general folding principles are extremely difficult in
microscopic all-atom approaches, since the folding process is strongly dependent on the
"disordered" sequence of amino acids and the native-fold formation is inevitably
connected with, at least, significant parts of the sequence. Moreover, for most proteins, the
folding process is relatively slow (microseconds to seconds), which is due to a complex,
rugged shape of the free-energy landscape [22, 23, 24] with "hidden" barriers,
depending on sequence properties. Although there is no obvious system parameter that allows
for a general description of the accompanying conformational transitions in folding
processes (as, for example, the reaction coordinate in chemical reactions), it is known that
there are only a few classes of characteristic folding behaviors, mainly downhill folding,
two-state folding, folding through intermediates, and glass-like folding into metastable
conformations SSQSSHIllj].



TABLE 1. Sequences of the heteropolymers compared with respect to their folding behavior.

Label    Sequence
S1
S2       A4BA2BABA2B2A3BA2
S3       A4B2A4BA2BA3B2A



Thus, if a classification of folding characteristics is useful at all, strongly simplified
models should reveal statistical [14] and kinetic [13] pseudouniversal properties. The
reason why it appears useful to use a simplified, mesoscopic model like the AB model
is two-fold: Firstly, it is believed that tertiary folding is mainly based on effective
hydrophobic interactions such that atomic details play a minor role. Secondly, systematic
comparative folding studies for mutated or permuted sequences are computationally
extremely demanding at the atomic level and are to date virtually impossible for realistic
proteins. We will show in the following that by employing the AB heteropolymer
model (4) and monitoring a suitable simple angular similarity parameter it is indeed
possible to identify different complex folding characteristics. The similarity parameter
is defined as follows [13]



$$Q(\mathbf{X},\mathbf{X}') = 1 - d(\mathbf{X},\mathbf{X}'). \qquad (6)$$

With $N_b = N - 2$ and $N_t = N - 3$ being the respective numbers of bond angles $\Theta_i$
and torsional angles $\Phi_i$, the angular deviation between the conformations is calculated
according to

$$d(\mathbf{X},\mathbf{X}') = \frac{1}{\pi (N_b + N_t)} \left[ \sum_{i=1}^{N_b} d_b(\Theta_i,\Theta_i') + \min_{r=\pm} \left( \sum_{i=1}^{N_t} d_t^{r}(\Phi_i,\Phi_i') \right) \right], \qquad (7)$$

where

$$d_b(\Theta_i,\Theta_i') = |\Theta_i - \Theta_i'|,$$
$$d_t^{\pm}(\Phi_i,\Phi_i') = \min\left(|\Phi_i \pm \Phi_i'|,\; 2\pi - |\Phi_i \pm \Phi_i'|\right). \qquad (8)$$

Here we have taken into account that the AB model is invariant under the reflection
symmetry $\Phi_i \to -\Phi_i$. Thus, it is not useful to distinguish between reflection-symmetric
conformations and therefore only the larger overlap is considered. Since $-\pi \le \Phi_i \le \pi$
and $0 \le \Theta_i \le \pi$, the overlap is unity if all angles of the conformations $\mathbf{X}$ and $\mathbf{X}'$
coincide, else $0 \le Q < 1$. It should be noted that the average overlap of a random
conformation with the corresponding reference state is for the sequences considered
close to $\langle Q \rangle \approx 0.66$. As a rule of thumb, it can be concluded that values $Q < 0.8$ indicate
weak or no significant similarity of a given structure with the reference conformation.
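
The angular overlap parameter can be sketched as follows. The function takes the bond and torsion angles of two conformations as plain lists in radians; how these angles are extracted from the monomer coordinates is left out here, and the function name `angular_overlap` is illustrative. The sign $-1$ compares the torsion angles directly, the sign $+1$ compares with the mirror image, and the smaller of the two torsional deviations is used.

```python
import math


def angular_overlap(theta, phi, theta_ref, phi_ref):
    """Angular overlap Q between a conformation and a reference.

    theta, theta_ref: bond angles in [0, pi], N - 2 values each
    phi, phi_ref:     torsion angles in [-pi, pi], N - 3 values each
    """
    n_b, n_t = len(theta), len(phi)
    if n_b != len(theta_ref) or n_t != len(phi_ref):
        raise ValueError("conformations differ in the number of angles")

    # Bond-angle contribution: sum of absolute differences.
    d_bond = sum(abs(a - b) for a, b in zip(theta, theta_ref))

    def d_torsion(sign):
        # Periodic distance between phi and -sign * phi_ref.
        total = 0.0
        for a, b in zip(phi, phi_ref):
            delta = abs(a + sign * b)
            total += min(delta, 2.0 * math.pi - delta)
        return total

    # Reflection symmetry: keep the smaller torsional deviation.
    d = (d_bond + min(d_torsion(+1), d_torsion(-1))) / (math.pi * (n_b + n_t))
    return 1.0 - d


# A conformation and its mirror image have full overlap with the reference.
print(angular_overlap([1.0, 2.0], [0.5], [1.0, 2.0], [0.5]))
print(angular_overlap([1.0, 2.0], [0.5], [1.0, 2.0], [-0.5]))
```

Because of the minimum over both signs, a structure and its mirror image are treated as equivalent, in line with the reflection symmetry of the model.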

For the qualitative discussion of the folding behavior it is useful to consider the
histogram of energy $E$ and angular overlap parameter $Q$, obtained from multicanonical
simulations,

$$H_{\mathrm{muca}}(E,Q) = \sum_t \delta_{E,E(\mathbf{X}_t)}\,\delta_{Q,Q(\mathbf{X}_t,\mathbf{X}^{(0)})}, \qquad (9)$$

where the sum runs over all Monte Carlo sweeps $t$. In Figs. 5(a)-5(c), the multicanonical
histograms $H_{\mathrm{muca}}(E,Q)$ are plotted for the three sequences listed in Table 1. Ideally,
multicanonical sampling yields a constant energy distribution

$$h_{\mathrm{muca}}(E) = \int_0^1 dQ\, H_{\mathrm{muca}}(E,Q) = \mathrm{const.} \qquad (10)$$

In consequence, the distribution $H_{\mathrm{muca}}(E,Q)$ can suitably be used to identify
the folding channels, independently of temperature. This is more difficult with
temperature-dependent canonical distributions $P^{\mathrm{can},T}(E,Q)$, which can, of course,
be obtained from $H_{\mathrm{muca}}(E,Q)$ by a simple reweighting procedure,
$P^{\mathrm{can},T}(E,Q) \sim H_{\mathrm{muca}}(E,Q)\,g(E)\exp(-E/k_BT)$. Nonetheless, it should be noted that, since there is a
unique one-to-one correspondence between the average energy $\langle E \rangle$ and temperature $T$,
regions of changes in the monotonic behavior of $H_{\mathrm{muca}}(E,Q)$ can also be assigned a
temperature, where a conformational transition occurs.
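
The reweighting step can be illustrated by a small sketch. It assumes that the multicanonical histogram is stored as a dictionary mapping bins $(E, Q)$ to counts and that an estimate of the logarithm of the density of states $g(E)$ is available for every energy bin; both data layouts and the function name `reweight_to_canonical` are illustrative choices. The weights are handled in logarithmic form and shifted by their maximum to avoid numerical overflow.

```python
import math


def reweight_to_canonical(h_muca, log_g, temperature, k_b=1.0):
    """Normalized canonical distribution P(E, Q) at a given temperature.

    h_muca: dict mapping (E, Q) bins to multicanonical histogram counts
    log_g:  dict mapping E to the logarithm of the density of states
    """
    log_weights = {}
    for (energy, overlap), count in h_muca.items():
        if count > 0:
            # log of H_muca(E, Q) * g(E) * exp(-E / k_B T)
            log_weights[(energy, overlap)] = (
                math.log(count) + log_g[energy] - energy / (k_b * temperature)
            )

    shift = max(log_weights.values())          # guards against overflow
    unnormalized = {key: math.exp(value - shift)
                    for key, value in log_weights.items()}
    norm = sum(unnormalized.values())
    return {key: value / norm for key, value in unnormalized.items()}


# Toy example with two bins: a folded state and a fourfold degenerate
# unfolded state, sampled equally often in the flat multicanonical run.
histogram = {(-1.0, 0.9): 10, (0.0, 0.7): 10}
log_density = {-1.0: 0.0, 0.0: math.log(4.0)}
for t in (0.5, 1.0, 5.0):
    p = reweight_to_canonical(histogram, log_density, t)
    print(t, round(p[(-1.0, 0.9)], 4))
```

In the toy example the low-energy bin dominates at low temperature, whereas the degenerate high-energy bin takes over at high temperature, which mimics the competition between energy and entropy discussed in the text.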

Interpreting the ridges of the probability distributions in the left-hand panel of Fig. 5
as folding channels, it can clearly be seen that the heteropolymers exhibit noticeable
differences in the folding behavior towards the native conformations (N). Considering
natural proteins it would not be surprising that different sequences of amino acids cause
in many cases not only different native folds but also vary in their folding behavior. Here
we are considering, however, a highly minimalistic heteropolymer model and hitherto
it has not been obvious that it is indeed possible to separate characteristic folding channels
in this simple model, but as Fig. 5 demonstrates, in fact, it is. For sequence S1, we
identify in Fig. 5(a) typical two-state characteristics. Approaching from high energies
(or high temperatures), the conformations in the ensemble of denatured conformations
(D) have an angular overlap $Q \approx 0.7$, which means that there is no significant similarity
with the reference structure, i.e., the ensemble D consists mainly of unfolded peptides.
For energies $E < -30$ a second branch opens. This channel (N) leads to the native
conformation (for which $Q = 1$ and $E_{\min} \approx -33.8$). The constant-energy distribution,
where the main and native-fold channels D and N coexist, exhibits two peaks noticeably
separated by a well. Therefore, the conformational transition between the channels looks
first-order-like, which is typical for two-state folding. The main channel D contains the
ensemble of unfolded conformations, whereas the native-fold channel N represents the
folded states.

The two-state behavior is confirmed by analyzing the temperature dependence of
the minima in the free-energy landscape. The free energy as a function of the "order"
parameter $Q$ at fixed temperature can be suitably defined as

$$F(Q) = -k_BT \ln p(Q). \tag{11}$$

In this expression,

$$p(Q') = \int \mathcal{D}X\, \delta\big(Q' - Q(X, X^{(0)})\big)\, e^{-E(X)/k_BT} \tag{12}$$

is related to the probability of finding a conformation with a given value of $Q$ in the
canonical ensemble at temperature $T$. The formal integration runs over all possible
conformations $X$. In the right-hand panel of Fig. 5(a), the free-energy landscape at
various temperatures is shown for sequence S1. At comparatively high temperatures
($T = 0.4$), only the unfolded states ($Q \approx 0.71$) in the main folding channel D
dominate. Decreasing the temperature, the second (native-fold) channel N begins to form
($Q \approx 0.9$), but the global free-energy minimum is still associated with the main
channel. Near $T \approx 0.1$, both free-energy minima have approximately the same value
and the folding transition occurs. The discontinuous character of this conformational
transition is manifest in the free-energy barrier between the two macrostates. For even
smaller temperatures, the native-fold-like conformations ($Q > 0.95$) dominate and fold
smoothly towards the $Q = 1$ reference conformation, which is the lowest-energy
conformation found in the simulation.

FIGURE 5. Multicanonical histograms $H_{\rm muca}(E,Q)$ of energy $E$ and angular overlap parameter $Q$
and free-energy landscapes $F(Q)$ at different temperatures for the three sequences (see Table 1) (a) S1,
(b) S2, and (c) S3. The reference folds reside at $Q = 1$ and $E = E_{\rm min}$. Pseudophases are symbolized
by D (denatured states), N (native folds), I (intermediates), and M (metastable states). Representative
conformations in intermediate and folded phases are also shown [14].
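The free-energy definition above translates directly into a few lines of code. The following is a minimal sketch, assuming a binned canonical distribution of the overlap parameter as input. The helper names are hypothetical, and empty bins are assigned an infinite free energy by convention.

```python
import math

def free_energy_profile(p_q, temperature, k_b=1.0):
    """F(Q) = -k_B T ln p(Q) for each bin; empty bins are mapped to +inf."""
    return [-k_b * temperature * math.log(p) if p > 0 else math.inf
            for p in p_q]

def local_minima(f):
    """Indices of interior local minima of a free-energy profile."""
    return [i for i in range(1, len(f) - 1)
            if f[i] < f[i - 1] and f[i] < f[i + 1]]
```

Two coexisting minima separated by a barrier, as found for S1 near the folding temperature, would show up as two indices returned by `local_minima`.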

A significantly different folding behavior is noticed for the heteropolymer with
sequence S2. The corresponding multicanonical histogram is shown in Fig. 5(b) and
represents a folding event through an intermediate macrostate. The main channel D
bifurcates and a side channel I branches off continuously. This branching is followed by
the formation of a third channel N, which ends in the native fold. The characteristics of
folding through intermediates is also confirmed by the free-energy landscapes as shown
for this sequence in Fig. 5(b) at different temperatures. Approaching from high
temperatures, the ensemble of denatured conformations D ($Q \approx 0.76$) is dominant.
Close to the transition temperature $T \approx 0.05$, the intermediary phase I is reached.
The overlap of these intermediary conformations with the native fold is about
$Q \approx 0.9$. Decreasing the temperature further below the native-folding threshold
close to $T = 0.01$, the hydrophobic-core formation is finished and stable
native-fold-like conformations with $Q > 0.97$ dominate (N).

The most extreme behavior of the three exemplified sequences is exhibited by the
heteropolymer S3. Figure 5(c) shows that the main channel D does not decay in favor of
a single native-fold channel. Instead, we observe the formation of two separate
native-fold channels M$_1$ and M$_2$. Channel M$_1$ advances towards the $Q = 1$ fold and
M$_2$ ends up in a completely different conformation with approximately the same energy
($E \approx -33.5$). The spatial structures of these two conformations are noticeably
different and their mutual overlap is correspondingly very small, $Q \approx 0.75$. It
should also be noted that the lowest-energy conformations in the main channel D have only
slightly larger energies than the two native folds. Thus, the folding of this
heteropolymer shows very complex characteristics. In fact, this multiple-peak
distribution near minimum energies is a strong indication of metastability. A native
fold in the natural sense does not exist; the $Q = 1$ conformation is only a reference
state, and the folding towards this structure is not distinguished as it is in the folding
characteristics of sequences S1 and S2. The amorphous folding behavior is also seen
in the free-energy landscapes in Fig. 5(c). Above the folding transitions ($T = 0.2$) the
typical sequence-independent denatured conformations with
$\langle Q\rangle \approx 0.77$ dominate (D). Then, in the annealing process, several
channels are formed and coexist. The two most prominent channels (to which the
lowest-energy conformations found in the simulations belong) eventually lead for
$T \approx 0.01$ to ensembles of macrostates with $Q > 0.97$ (M$_1$) and conformations
with $Q < 0.75$ (M$_2$). The lowest-energy conformation found in this regime is
structurally different from, but energetically degenerate with, the reference
conformation.



STATISTICAL ANALYSES OF PEPTIDE AGGREGATION 



Pseudophase separation in polymeric nucleation processes 

Besides folding mechanisms, the aggregation of proteins belongs to the biologically
most relevant molecular structure formation processes. While the specific docking
between receptors and ligands is not necessarily accompanied by global structural changes,
protein folding and oligomerization of peptides are typically cooperative conformational
transitions [32]. Proteins and their aggregates are comparatively small systems and are
often formed by only a few peptides. A very prominent example is the extracellular
aggregation of the A$\beta$ peptide, which is associated with Alzheimer's disease.
Following the amyloid hypothesis, it is believed that these aggregates are neurotoxic,
i.e., they are able to fuse into cell membranes of neurons and open calcium ion channels.
It is known that extracellular Ca$^{2+}$ ions intruding into a neuron can promote its
degeneration [33, 34, 35].

The conformational transitions that proteins experience during structuring and
aggregation are not phase transitions in the strict thermodynamic sense, and their
statistical analysis is usually based on studies of signals exposed by energetic and
structural fluctuations, as well as system-specific "order" parameters. In these studies,
the temperature $T$ is considered as an adjustable, external control parameter and, for the
analysis of the pseudophase transitions, the peak structure of quantities such as the
specific heat and the fluctuations of the gyration tensor components or "order" parameter
as functions of the temperature are investigated. The natural ensemble for this kind of
analysis is the canonical ensemble, where the possible states of the system with
energies $E$ are distributed according to the Boltzmann probability $\exp(-E/k_BT)$, where
$k_B$ is the Boltzmann constant. However, phase separation processes of small systems
are accompanied by surface effects at the interface between the pseudophases [36, 37].
This is reflected by the behavior of the microcanonical entropy $\mathcal{S}(E)$, which
is convex in the transition region. Consequences are the backbending
of the caloric temperature $T(E) = (\partial\mathcal{S}/\partial E)^{-1}$, i.e., the
decrease of temperature with increasing system energy, and the negativity of the
microcanonical specific heat
$C_V(E) = (\partial T(E)/\partial E)^{-1} = -(\partial\mathcal{S}/\partial E)^2/(\partial^2\mathcal{S}/\partial E^2)$.
The physical reason is that the
free energy balance in phase equilibrium requires the minimization of the interfacial
surface and, therefore, the loss of entropy [38-40]. A reduction of the entropy can,
however, only be achieved by transferring energy into the system.

It is a surprising fact that this so-called backbending effect is indeed observed in
transitions with phase separation. Although this phenomenon has been known
for a long time from astrophysical systems [41], it has been widely ignored since then
as a somewhat "exotic" effect. Recently, however, experimental evidence was found from
melting studies of sodium clusters by photofragmentation [42]. Bimodality and negative
specific heats are also known from nuclei fragmentation experiments and models [43, 44],
as well as from spin models on finite lattices which experience first-order transitions
in the thermodynamic limit [45, 46]. This phenomenon is also observed in a large
number of other isolated finite model systems for evaporation and melting effects [47, 48].

The following discussion of the aggregation behavior is based on multicanonical
computer simulations of a mesoscopic hydrophobic-polar heteropolymer model for
aggregation based on the AB model [39, 40].



Mesoscopic hydrophobic-polar aggregation model 

For studies of heteropolymer aggregation on mesoscopic scales, a novel model is
employed that is based on the hydrophobic-polar single-chain AB model [11]. As for
modeling heteropolymer folding, we assume here that the tertiary folding process of the
individual chains is governed by hydrophobic-core formation in an aqueous environment.
For systems of more than one chain, we further take into account that the interaction
strengths between nonbonded residues are independent of the individual properties
of the chains the residues belong to. Therefore, we use the same parameter sets as in
the AB model for the pairwise interactions between residues of different chains. Our
aggregation model reads [39, 40]

$$E = \sum_{\mu} E_{\rm AB}^{(\mu)} + \sum_{\mu<\nu}\,\sum_{i_\mu, j_\nu} \Phi(r_{i_\mu j_\nu}; \sigma_{i_\mu}, \sigma_{j_\nu}), \tag{13}$$

where $\mu, \nu$ label the $M$ polymers interacting with each other, and $i_\mu, j_\nu$
index the $N_{\mu,\nu}$ monomers of the respective $\mu$th and $\nu$th polymer. The
intrinsic single-chain energy of the $\mu$th polymer is given by the single-chain AB
model energy,

$$E_{\rm AB}^{(\mu)} = \frac{1}{4}\sum_{i_\mu}\left(1 - \cos\vartheta_{i_\mu}\right) + \sum_{j_\mu > i_\mu + 1} \Phi(r_{i_\mu j_\mu}; \sigma_{i_\mu}, \sigma_{j_\mu}), \tag{14}$$

with $0 \le \vartheta_{i_\mu} \le \pi$ denoting the bending angle between monomers
$i_\mu$, $i_\mu + 1$, and $i_\mu + 2$. The nonbonded inter-residue pair potential

$$\Phi(r_{i_\mu j_\nu}; \sigma_{i_\mu}, \sigma_{j_\nu}) = 4\left[r_{i_\mu j_\nu}^{-12} - C(\sigma_{i_\mu}, \sigma_{j_\nu})\, r_{i_\mu j_\nu}^{-6}\right] \tag{15}$$

depends on the distance $r_{i_\mu j_\nu}$ between the residues, and on their type,
$\sigma_{i_\mu} = A, B$. The long-range behavior is attractive for like pairs of residues
[$C(A,A) = 1$, $C(B,B) = 0.5$] and repulsive otherwise [$C(A,B) = C(B,A) = -0.5$]. The
lengths of all virtual peptide bonds are set to unity.
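To make the structure of the model energy concrete, the following minimal sketch evaluates it for explicit coordinates. It assumes the standard AB-model pair potential $4[r^{-12} - C\,r^{-6}]$ with the coefficients quoted above, takes the bending angle as the angle between successive bond vectors, and ignores periodic boundary conditions. All function names are illustrative.

```python
import math

# Long-range coefficients C(sigma_i, sigma_j) as quoted in the text.
C = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): -0.5, ("B", "A"): -0.5}

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pair_potential(r, s1, s2):
    """Phi(r; s1, s2) = 4 [r^-12 - C(s1, s2) r^-6]."""
    return 4.0 * (r ** -12 - C[(s1, s2)] * r ** -6)

def single_chain_energy(coords, seq):
    """Bending term plus nonbonded intra-chain pairs (j > i + 1)."""
    energy = 0.0
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [y - x for x, y in zip(a, b)]  # bond vector a -> b
        v = [y - x for x, y in zip(b, c)]  # bond vector b -> c
        cos_theta = (sum(x * y for x, y in zip(u, v))
                     / (distance(a, b) * distance(b, c)))
        energy += 0.25 * (1.0 - cos_theta)
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            energy += pair_potential(distance(coords[i], coords[j]),
                                     seq[i], seq[j])
    return energy

def total_energy(chains, seqs):
    """Sum of single-chain energies plus all inter-chain pair terms."""
    energy = sum(single_chain_energy(c, s) for c, s in zip(chains, seqs))
    for mu in range(len(chains)):
        for nu in range(mu + 1, len(chains)):
            for ri, si in zip(chains[mu], seqs[mu]):
                for rj, sj in zip(chains[nu], seqs[nu]):
                    energy += pair_potential(distance(ri, rj), si, sj)
    return energy
```

For two well-separated chains the inter-chain sum is negligible and `total_energy` reduces to the sum of the single-chain energies, which is the fragmented limit of the model.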

Employing this model, we study in the following the thermodynamic properties of the
aggregation of oligomers with the Fibonacci sequence F1: AB$_2$AB$_2$ABAB$_2$AB over the
whole energy and temperature regime.



Order parameter of aggregation and fluctuations 



In order to distinguish between the fragmented and the aggregated regime, we
introduce the "order" parameter [39, 40]

$$\Gamma^2 = \frac{1}{2M^2}\sum_{\mu,\nu=1}^{M} \mathbf{d}_{\rm per}^2\big(\mathbf{r}_{{\rm COM},\mu}, \mathbf{r}_{{\rm COM},\nu}\big), \tag{16}$$

where the summations are taken over the minimum distances
$\mathbf{d}_{\rm per} = (d_{\rm per}^{(1)}, d_{\rm per}^{(2)}, d_{\rm per}^{(3)})$
of the respective centers of mass of the chains (or their periodic continuations). The
center of mass of the $\mu$th chain in a box with periodic boundary conditions is defined
as $\mathbf{r}_{{\rm COM},\mu} = \sum_{i_\mu=1}^{N_\mu}\left[\mathbf{d}_{\rm per}(\mathbf{r}_{i_\mu}, \mathbf{r}_{1_\mu}) + \mathbf{r}_{1_\mu}\right]/N_\mu$,
where $\mathbf{r}_{1_\mu}$ is the coordinate vector of the first monomer and serves as a
reference coordinate in a local coordinate system.

The aggregation parameter is to be considered as a qualitative measure; roughly,
fragmentation corresponds to large values of $\Gamma$, whereas aggregation requires the
centers of mass to be close to each other, in which case $\Gamma$ is comparatively small.
Despite its qualitative nature, it turns out to be a surprisingly clear indicator for the
aggregation transition and even allows a clear discrimination of different aggregation
pathways, as will be seen later on.
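A minimal sketch of how such an aggregation parameter could be evaluated is given below. It assumes a cubic box of edge length `box`, the minimum-image convention for periodic distances, and the normalization $\Gamma^2 = \sum_{\mu,\nu} d_{\rm per}^2/(2M^2)$. The normalization and all names are assumptions made for illustration.

```python
import math

def min_image(a, b, box):
    """Minimum-image displacement from b to a in a cubic periodic box."""
    return [(x - y) - box * round((x - y) / box) for x, y in zip(a, b)]

def center_of_mass(chain, box):
    """Center of mass relative to the first monomer (unwrapped)."""
    ref = chain[0]
    shifts = [min_image(r, ref, box) for r in chain]
    return [ref[k] + sum(s[k] for s in shifts) / len(chain) for k in range(3)]

def aggregation_parameter(chains, box):
    """Gamma from the pairwise minimum-image distances of the chain centers."""
    coms = [center_of_mass(c, box) for c in chains]
    m = len(coms)
    total = sum(sum(d * d for d in min_image(a, b, box))
                for a in coms for b in coms)
    return math.sqrt(total / (2.0 * m * m))
```

Chains whose centers of mass lie close together (directly or across a periodic boundary) give a small value, while well-separated chains give a large one.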

According to the Boltzmann distribution, we define canonical expectation values of
any observable $O$ by

$$\langle O\rangle(T) = \frac{1}{Z_{\rm can}(T)} \prod_{\mu=1}^{M}\left[\int \mathcal{D}X_\mu\right] O(\{X_\mu\})\, e^{-E(\{X_\mu\})/k_BT}, \tag{17}$$

where the canonical partition function $Z_{\rm can}$ is given by

$$Z_{\rm can}(T) = \prod_{\mu=1}^{M}\left[\int \mathcal{D}X_\mu\right] e^{-E(\{X_\mu\})/k_BT}. \tag{18}$$

Formally, the integrations are performed over all possible conformations of the $M$
chains.

Similarly to the specific heat per monomer
$c_V(T) = d\langle E\rangle/N_{\rm tot}dT = (\langle E^2\rangle - \langle E\rangle^2)/N_{\rm tot}k_BT^2$
(with $N_{\rm tot} = \sum_{\mu=1}^{M} N_\mu$), which expresses the thermal fluctuations of
energy, the temperature derivative of $\langle\Gamma\rangle$ per monomer,
$d\langle\Gamma\rangle/N_{\rm tot}dT = (\langle\Gamma E\rangle - \langle\Gamma\rangle\langle E\rangle)/N_{\rm tot}k_BT^2$,
is a useful indicator for cooperative behavior of the multiple-chain system. Since the
system size is small (both the number of monomers $N_{\rm tot}$ and the number of chains
$M$), aggregation transitions, if any, are expected to be signalized by the peak structure
of the fluctuating quantities as functions of the temperature. This requires the
temperature to be a unique external control parameter, which is a natural choice in the
canonical statistical ensemble. Furthermore, it is typically easily adjustable and,
therefore, a convenient parameter in experiments. However, aggregation is a phase
separation process and, since the system is small, there is no unique mapping between
temperature and energy [39, 40]. For this reason, the total system energy is the more
appropriate external parameter. Thus, the microcanonical interpretation will turn out to
be the more favorable description, at least in the transition region.

FIGURE 6. Aggregation parameter $\langle\Gamma\rangle/N_{\rm tot}$ and fluctuations $d\langle\Gamma\rangle/N_{\rm tot}dT$ as functions of the temperature.
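The two fluctuation formulas can be evaluated directly from a canonical time series. The sketch below assumes equally weighted samples of the energy and of the aggregation parameter recorded at a fixed temperature. The function name and argument order are illustrative.

```python
def fluctuations_per_monomer(energies, gammas, temperature, n_tot, k_b=1.0):
    """Return (c_V, d<Gamma>/dT), each per monomer, from canonical samples.

    c_V          = (<E^2> - <E>^2)          / (N_tot k_B T^2)
    d<Gamma>/dT  = (<Gamma E> - <Gamma><E>) / (N_tot k_B T^2)
    """
    n = len(energies)
    mean_e = sum(energies) / n
    mean_g = sum(gammas) / n
    mean_e2 = sum(e * e for e in energies) / n
    mean_ge = sum(g * e for g, e in zip(gammas, energies)) / n
    denom = n_tot * k_b * temperature ** 2
    return (mean_e2 - mean_e ** 2) / denom, (mean_ge - mean_g * mean_e) / denom
```

Scanning the temperature (for instance by reweighting a multicanonical run) and locating the peaks of both quantities gives the canonical estimate of the transition temperature.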



Canonical and microcanonical interpretation 

In our aggregation study of the 2×F1 system we obtain from the canonical analysis
a surprisingly clear picture of the aggregation transition. In Fig. 6, the temperature
dependence of the mean aggregation order parameter $\langle\Gamma\rangle$ and the
fluctuations of $\Gamma$ are shown. The aggregation transition is signalized by a very
sharp peak and we read off an aggregation temperature close to
$T_{\rm agg} \approx 0.20$. The aggregation of the two peptides is a single-step process,
in which the formation of the aggregate with a common compact hydrophobic core governs the
folding behavior of the individual chains. Under such conditions, folding and binding are
not separate processes. This is different if the intrinsic polymeric forces are stronger
than the binding affinity. In this case the already folded molecule simply docks at the
active site of a target without changing its global conformation. These two principal
scenarios are also observed in molecular adsorption processes at solid substrates. This
provides another example where mesoscopic models prove extremely useful in order to reveal
the structural phases of adsorbed and desorbed conformations as functions of external
parameters such as solvent quality and temperature [49, 50].

In the microcanonical analysis of peptide-peptide aggregation, the system energy $E$
is kept (almost) fixed and treated as an external control parameter. The system can only
take macrostates with energies in the interval $(E, E + \Delta E)$ with $\Delta E$ being
sufficiently small to satisfy $\Delta G(E) = g(E)\Delta E$, where $\Delta G(E)$ is the
phase-space volume of this energetic shell. In the limit $\Delta E \to 0$, the total
phase-space volume up to the energy $E$ can thus be expressed as

$$G(E) = \int_{E_{\rm min}}^{E} dE'\, g(E'). \tag{19}$$

FIGURE 7. Microcanonical Hertz entropy $\mathcal{S}(E)$ of the 2×F1 system, concave Gibbs hull $\mathcal{H}_{\mathcal{S}}(E)$, and
inverse caloric temperature $T^{-1}(E)$ as functions of energy. The phase separation regime ranges from $E_{\rm agg}$
to $E_{\rm frag}$. Between $T_<^{-1}$ and $T_>^{-1}$, the temperature is no suitable external control parameter and the canonical
interpretation is not useful: The inverse caloric temperature $T^{-1}(E)$ exhibits an obvious backbending in
the transition region. Note the second, less-pronounced backbending in the energy range $E_< < E < E_{\rm frag}$.

Since $g(E)$ is positive for all $E$, $G(E)$ is a monotonically increasing function and
this quantity is suitably related to the microcanonical entropy $\mathcal{S}(E)$ of the
system. In the definition of Hertz,

$$\mathcal{S}(E) = k_B \ln G(E). \tag{20}$$

Alternatively, the entropy is often directly related to the density of states $g(E)$ and
defined as

$$S(E) = k_B \ln g(E). \tag{21}$$
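Both entropy definitions are easily compared numerically once an estimate of $\ln g(E)$ on an energy grid is available. The following minimal sketch accumulates the phase-space volume in logarithmic space, since $g(E)$ typically varies over many orders of magnitude. The uniform bin width `delta_e` and the function name are assumptions.

```python
import math

def hertz_entropy(ln_g, delta_e, k_b=1.0):
    """S_Hertz(E_i) = k_B ln G(E_i) with G(E_i) = sum_{j <= i} g(E_j) dE,
    accumulated with a running log-sum-exp to avoid overflow."""
    out = []
    running = -math.inf  # ln of the accumulated phase-space volume
    for lg in ln_g:
        term = lg + math.log(delta_e)
        hi = max(running, term)
        running = hi + math.log(math.exp(running - hi) + math.exp(term - hi))
        out.append(k_b * running)
    return out
```

When $g(E)$ grows rapidly with energy, the last term dominates the sum and the result is close to $k_B \ln g(E)$ up to the constant $k_B\ln\Delta E$, in line with the near-equivalence of the two definitions in the transition region.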

The density of states decreases much faster than exponentially towards the
low-energy states. For this reason, the phase-space volume at energy $E$ is strongly
dominated by the number of states in the energy shell $\Delta E$. Thus
$G(E) \approx \Delta G(E) \sim g(E)$ is directly related to the density of states. This
virtual identity breaks down in the higher-energy region, where $\ln g(E)$ becomes flat,
in our case far above the energetic regions relevant for the discussion of the aggregation
transition (i.e., for energies $E \gg E_{\rm frag}$, see Fig. 7). Actually, both
definitions of the entropy lead to virtually identical results in the analysis of the
aggregation transition [39, 40]. The (reciprocal) slope of the microcanonical entropy fixes
the temperature scale and the corresponding caloric temperature is then defined via
$T(E) = (\partial\mathcal{S}(E)/\partial E)^{-1}$ for fixed volume $V$ and particle
number $N_{\rm tot}$.
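The caloric temperature and the microcanonical specific heat follow from finite differences of the entropy. The sketch below assumes a uniform energy grid and uses a toy entropy with a convex intruder, chosen purely for illustration. It is not data from the simulations discussed here.

```python
def caloric_curves(energies, entropy):
    """Return (E, T, C_V) at the interior points of a uniform energy grid.

    T(E)   = (dS/dE)^-1
    C_V(E) = -(dS/dE)^2 / (d^2S/dE^2)
    """
    h = energies[1] - energies[0]
    rows = []
    for i in range(1, len(energies) - 1):
        d1 = (entropy[i + 1] - entropy[i - 1]) / (2.0 * h)
        d2 = (entropy[i + 1] - 2.0 * entropy[i] + entropy[i - 1]) / h ** 2
        c_v = -d1 * d1 / d2 if d2 != 0.0 else float("inf")
        rows.append((energies[i], 1.0 / d1, c_v))
    return rows

# Toy entropy (illustrative only): convex between the inflection points.
energies = [i / 10 for i in range(-12, 13)]
entropy = [5.0 * e - (e * e - 1.0) ** 2 for e in energies]
backbending = [e for e, t, c_v in caloric_curves(energies, entropy) if c_v < 0]
```

Energies collected in `backbending` are those where the entropy is convex, so that the temperature decreases with increasing energy and the specific heat is negative.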

As long as the mapping between the caloric temperature $T$ and the system energy $E$
is bijective, the canonical analysis of crossover and phase transitions is suitable since
the temperature can be treated as external control parameter. For systems where this
condition is not satisfied, however, a standard canonical analysis may easily
miss a physical effect accompanying condensation processes: Due to surface effects
(the formation of the contact surface between the peptides requires a rearrangement
of monomers in the surfaces of the individual peptides), additional energy does not
necessarily lead to an increase of temperature of the condensate. Actually, the aggregate
can even become colder. The supply of additional energy supports the fragmentation
of parts of the aggregate, but this is overcompensated by cooperative processes of
the particles aiming to reduce the surface tension. Condensation processes are
phase-separation processes and, as such, aggregated and fragmented phases coexist. Since
in this phase-separation region the mapping between $T$ and $E$ is not bijective, this
phenomenon is called the "backbending effect". Probably the most important class of
systems exhibiting this effect is characterized by smallness and the capability to form
aggregates, depending on the interaction range. The fact that this effect could be
indirectly observed in sodium clustering experiments [42] gives rise to the hope that
backbending could also be observed in aggregation processes of small peptides.

Since the 2×F1 system apparently belongs to this class, the backbending effect is
also observed in the aggregation/fragmentation transition of this system. This is shown
in Fig. 7, where the microcanonical entropy $\mathcal{S}(E)$ is plotted as function of the
system energy. The phase-separation region of aggregated and fragmented conformations lies
between $E_{\rm agg} \approx -8.85$ and $E_{\rm frag} \approx 1.05$. Constructing the
concave Gibbs hull $\mathcal{H}_{\mathcal{S}}(E)$ by linearly connecting
$\mathcal{S}(E_{\rm agg})$ and $\mathcal{S}(E_{\rm frag})$ (straight dashed line in
Fig. 7), the entropic deviation due to surface effects is simply
$\Delta\mathcal{S}(E) = \mathcal{H}_{\mathcal{S}}(E) - \mathcal{S}(E)$. The deviation is
maximal for $E = E_{\rm sep}$ and
$\Delta\mathcal{S}(E_{\rm sep}) \equiv \Delta\mathcal{S}_{\rm surf}$ is the surface
entropy. The Gibbs hull also defines the aggregation transition temperature

$$T_{\rm agg} = \left(\frac{\partial\mathcal{H}_{\mathcal{S}}(E)}{\partial E}\right)^{-1}. \tag{22}$$

For the 2×F1 system, we find $T_{\rm agg} \approx 0.198$, which is virtually identical
with the peak temperature of the aggregation parameter fluctuation (see Fig. 6).
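The Gibbs construction can be sketched numerically as the upper convex hull of the points $(E, \mathcal{S}(E))$. The following minimal example assumes a monotonically increasing entropy on a sorted energy grid and returns the hull segment with the largest entropic deviation. The function names and the returned dictionary keys are illustrative.

```python
def upper_hull(points):
    """Upper convex hull of points sorted by energy (monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the last vertex if it lies on or below the chord to p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def gibbs_construction(energies, entropy):
    """Return E_agg, E_frag, T_agg, E_sep and the surface entropy, or None
    if the entropy is concave everywhere on the grid."""
    pts = list(zip(energies, entropy))
    hull = upper_hull(pts)
    best = None
    for (e1, s1), (e2, s2) in zip(hull, hull[1:]):
        slope = (s2 - s1) / (e2 - e1)  # equals 1 / T_agg on this segment
        for e, s in pts:
            if e1 < e < e2:
                dev = s1 + slope * (e - e1) - s
                if best is None or dev > best["surface_entropy"]:
                    best = {"e_agg": e1, "e_frag": e2, "t_agg": 1.0 / slope,
                            "e_sep": e, "surface_entropy": dev}
    return best
```

The slope of the bridging segment gives the inverse transition temperature, and the largest gap between hull and entropy gives the surface entropy, mirroring the construction described above.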

The inverse caloric temperature $T^{-1}(E)$ is also plotted in Fig. 7. For a fixed
temperature in the interval $T_< < T < T_>$ ($T_< \approx 0.169$ and
$T_> \approx 0.231$), different energetic macrostates coexist. This is a consequence of
the backbending effect. Within the backbending region, the temperature decreases with
increasing system energy. The horizontal line at $T_{\rm agg}^{-1} \approx 5.04$ is the
Maxwell construction, i.e., the slope of the Gibbs hull $\mathcal{H}_{\mathcal{S}}(E)$.
The transition seems to have similarities with the van der Waals description
of the condensation/evaporation transition of gases: the "overheating" of the aggregate
between $T_{\rm agg}$ and $T_>$ (within the energy interval
$E_{\rm agg} < E < E_> \approx -5.13$) is as apparent as the "undercooling" of the
fragments between $T_<$ and $T_{\rm agg}$ (in the energy interval
$E_{\rm frag} > E > E_< \approx -1.13$). It is important to notice, however, that in
contrast to the van der Waals picture the backbending effect in between is a real physical
effect. Another essential result is that in the transition region the temperature is not a
suitable external control parameter: The macrostate of the system cannot be adjusted by
fixing the temperature. The better choice is the system energy, which is unfortunately
difficult to control in experiments. Another direct consequence of the energetic ambiguity
for a fixed temperature between $T_<$ and $T_>$ is that the canonical interpretation is
not suitable for detecting the backbending phenomenon.

FIGURE 8. Microcanonical specific heat $C_V(E)$ for the 2×F1 complex. Note the negativity in the
backbending regions [39].

The most remarkable result is the negativity of the specific heat of the system in
the backbending region, as shown in Fig. 8. A negative specific heat in the phase
separation regime is due to the nonadditivity of the energy of the two subsystems, as
the interaction between the chains is stronger than the attractive intra-chain forces of
the individual polymers. "Heating" a large aggregate would lead to the stretching of
monomer-monomer contact distances, i.e., the potential energy of an exemplified pair of
monomers increases, while kinetic energy and, therefore, temperature remain largely
constant. In a comparatively small aggregate, additional energy leads to cooperative
rearrangements of monomers in the aggregate in order to reduce surface tension, i.e., the
formation of molten globular aggregates is suppressed. In consequence, kinetic energy
is transferred into potential energy and the temperature decreases. In this regime, the
aggregate becomes colder, although the total energy increases [39].

The precise microcanonical analysis also reveals a further detail of the aggregation
transition. Close to $E_{\rm pre} \approx -0.32$, the curve in Fig. 7 exhibits another
"backbending" which signalizes a second, but unstable transition of the same type. The
associated transition temperature $T_{\rm pre} \approx 0.18$ is smaller than
$T_{\rm agg}$, but this transition occurs in the energetic region where fragmented states
dominate. Thus this transition can be interpreted as the premelting of aggregates by
forming intermediate states. These intermediate structures are rather weakly stable: The
population of the premolten aggregates never dominates. In particular, at $T_{\rm pre}$,
where premolten aggregates and fragments coexist, the population of compact aggregates is
much larger. This can nicely be seen in the canonical energy histograms at these
temperatures plotted in Fig. 9, where the second backbending is only signalized by a small
cusp in the coexistence region. Since both transitions are phase-separation processes,
structure formation is accompanied by releasing latent heat, which can be defined as the
energetic width of the phase coexistence regimes, i.e.,
$\Delta Q_{\rm agg} = E_{\rm frag} - E_{\rm agg} = T_{\rm agg}[\mathcal{S}(E_{\rm frag}) - \mathcal{S}(E_{\rm agg})] \approx 9.90$
and
$\Delta Q_{\rm pre} = E_{\rm frag} - E_{\rm pre} = T_{\rm pre}[\mathcal{S}(E_{\rm frag}) - \mathcal{S}(E_{\rm pre})] \approx 1.37$.
Obviously, the energy required to melt the premolten aggregate is much smaller than that
needed to dissolve a compact (solid) aggregate.

FIGURE 9. Logarithmic plots of the canonical energy histograms (not normalized) at $T \approx 0.18$ and
$T \approx 0.20$, respectively.

For the comparison of the surface entropies, we use the definition (21) of the entropy.
In the case of the aggregation transition, the surface entropy is
$\Delta S^{\rm agg}_{\rm surf} \equiv \Delta S_{\rm surf} = H_S(E_{\rm sep}) - S(E_{\rm sep})$,
where $H_S(E) \approx \mathcal{H}_{\mathcal{S}}(E)$ is the concave Gibbs hull of $S(E)$.
Since $H_S(E_{\rm sep}) = H_S(E_{\rm frag}) - (E_{\rm frag} - E_{\rm sep})/T_{\rm agg}$ and
$H_S(E_{\rm frag}) = S(E_{\rm frag})$, the surface entropy is

$$\Delta S_{\rm surf} = S(E_{\rm frag}) - S(E_{\rm sep}) - \frac{1}{T_{\rm agg}}\left(E_{\rm frag} - E_{\rm sep}\right). \tag{23}$$

Utilizing further that the canonical distribution
$h_{\rm can}(E) = \int d\Gamma\, H_{\rm can}(E, \Gamma; T)$ at $T_{\rm agg}$ (shown
in Fig. 9) is $h_{\rm can}(E) \sim g(E)\exp(-E/k_BT_{\rm agg})$, the surface entropy can be
written in the simple and computationally convenient form [45]:

$$\Delta S_{\rm surf} = k_B \ln\frac{h_{\rm can}(E_{\rm frag})}{h_{\rm can}(E_{\rm sep})}. \tag{24}$$

A similar expression is valid for the coexistence of premolten and fragmented states at
$T_{\rm pre}$. The corresponding canonical distribution is also shown in Fig. 9. Thus, we
obtain (in units of $k_B$) for the surface entropy of the aggregation transition
$\Delta S^{\rm agg}_{\rm surf} \approx 2.48$ and for the premelting
$\Delta S^{\rm pre}_{\rm surf} \approx 0.04$, confirming the weakness of the interface
between premolten aggregates and fragmented states.
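The histogram form of the surface entropy lends itself to a very short implementation. The sketch below assumes a bimodal canonical energy histogram recorded at the transition temperature, with the higher-energy peak identified with the fragmented phase and a nonzero count at the minimum between the peaks. The function name and the peak-finding rule are illustrative.

```python
import math

def surface_entropy_from_histogram(hist, k_b=1.0):
    """k_B ln[h(E_frag) / h(E_sep)] for a bimodal histogram on an energy grid."""
    peaks = [i for i in range(1, len(hist) - 1)
             if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]]
    if len(peaks) < 2:
        raise ValueError("histogram is not bimodal")
    # Keep the two highest peaks; the right one is the fragment peak.
    left, right = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])
    sep = min(range(left, right + 1), key=lambda i: hist[i])
    return k_b * math.log(hist[right] / hist[sep])
```

Only the ratio of two histogram entries enters, so the histogram does not need to be normalized.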
FIGURE 10. Microcanonical entropies per monomer $s(e)$, respective Gibbs constructions $h_s(e)$ (left-hand
scale), and deviations $\Delta s(e) = h_s(e) - s(e)$ (right-hand scale) for 2×F1 (labeled as 2), 3×F1 (3), and
4×F1 (4) as functions of the energy per monomer $e$.

Aggregation transition in larger heteropolymer systems 

The statements in the previous section for the 2×F1 system are also, in general, valid
for larger systems. This is the result of computer simulations for systems consisting of
three (in the following referred to as 3×F1) and four (4×F1) identical peptides with
sequence F1. As already discussed for the 2×F1 system, the larger systems also show no
obvious signals for separate aggregation and hydrophobic-core formation processes. Only
weak activity in the energy fluctuations in the temperature region below the aggregation
transition temperature indicates that local restructuring processes of little
cooperativity (comparable with the premolten aggregates discussed for the 2×F1 system) are
still happening. The strength of the aggregation transition is also documented by the fact
that the peak temperatures of energetic and aggregation parameter fluctuations are
virtually identical for the multi-peptide systems, i.e., the aggregation temperature is
$T_{\rm agg} \approx 0.2$.

In Fig. 10, the microcanonical entropies per monomer $s(e) = \mathcal{S}(e)/N_{\rm tot}$
(shifted by an unimportant constant for clearer visibility) and the corresponding Gibbs
hulls $h_s(e) = \mathcal{H}_{\mathcal{S}}(e)/N_{\rm tot}$ are shown for 2×F1 (in the
figure denoted by "2"), 3×F1 ("3"), and 4×F1 ("4"), respectively, as functions of the
energy per monomer $e = E/N_{\rm tot}$. Although the convex entropic "intruder" is apparent
for larger systems as well, its relative strength decreases with increasing number of
chains. The slopes of the respective Gibbs constructions determine the aggregation
temperatures [Eq. (22)], which are found to be $T_{\rm agg}^{3\times{\rm F1}} \approx 0.212$
and $T_{\rm agg}^{4\times{\rm F1}} \approx 0.217$.

The existence of the interfacial boundary entails a transition barrier whose strength is
characterized by the surface entropy $\Delta\mathcal{S}_{\rm surf}$. In Fig. 10, the
individual entropic deviations per monomer,
$\Delta s(e) = \Delta\mathcal{S}(e)/N_{\rm tot}$, are also shown, and the maximum
deviations, i.e., the surface entropies $\Delta\mathcal{S}_{\rm surf}$ and relative surface
entropies per monomer $\Delta s_{\rm surf} = \Delta\mathcal{S}_{\rm surf}/N_{\rm tot}$, are
listed in Table 2. There is no apparent difference between the values of
$\Delta\mathcal{S}_{\rm surf}$ that would indicate a trend for a vanishing of the absolute
surface barrier in larger systems. However, the relative surface entropy
$\Delta s_{\rm surf}$ obviously decreases. Whether or not it vanishes in the thermodynamic
limit cannot be decided from our present results and deserves a study in its own right.

TABLE 2. Aggregation temperatures $T_{\rm agg}$, surface entropies $\Delta\mathcal{S}_{\rm surf}$,
relative surface entropies per monomer $\Delta s_{\rm surf}$, relative aggregation and
fragmentation energies per monomer, $e_{\rm agg}$ and $e_{\rm frag}$, respectively,
latent heat per monomer $\Delta q$, and phase-separation entropy per monomer
$\Delta q/T_{\rm agg}$. All quantities for systems consisting of two, three, and four
13-mers with AB sequence F1.

system   T_agg   ΔS_surf   Δs_surf   e_agg   e_frag   Δq     Δq/T_agg
2×F1     0.198   2.48      0.10      -0.34   0.04     0.38   1.92
3×F1     0.212   2.60      0.07      -0.40   0.05     0.45   2.12
4×F1     0.217   2.30      0.04      -0.43   0.05     0.48   2.21
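The derived columns of Table 2 follow from the primary ones by simple arithmetic, which the following short check reproduces. It assumes $N_{\rm tot} = 13M$ monomers for $M$ chains of the 13-mer F1 and takes the tabulated values as input. The variable names are illustrative.

```python
# (system, N_tot, T_agg, surface entropy, e_agg, e_frag) taken from Table 2
rows = [("2xF1", 26, 0.198, 2.48, -0.34, 0.04),
        ("3xF1", 39, 0.212, 2.60, -0.40, 0.05),
        ("4xF1", 52, 0.217, 2.30, -0.43, 0.05)]

def derived(n_tot, t_agg, ds_surf, e_agg, e_frag):
    """Return (surface entropy per monomer, latent heat per monomer,
    phase-separation entropy per monomer), rounded to two decimals."""
    dq = e_frag - e_agg
    return round(ds_surf / n_tot, 2), round(dq, 2), round(dq / t_agg, 2)

table = {name: derived(*vals) for name, *vals in rows}
```

The recomputed values agree with the tabulated ones: the surface entropy per monomer decreases with the number of chains, whereas the latent heat per monomer and the phase-separation entropy per monomer increase.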

It is also interesting that subleading effects increase and that the double-well form
found for 2×F1 is modified by higher-order effects. It seems that for larger systems the
almost single-step aggregation of 2×F1 is replaced by a multiple-step process.

Not surprisingly, the fragmented phase is hardly influenced by side effects and the
rightmost minimum in Fig. 10 lies at
$e_{\rm frag} = E_{\rm frag}/N_{\rm tot} \approx 0.04$ to $0.05$. Since the
Gibbs construction covers the whole convex region of $s(e)$, the aggregation energy
per monomer $e_{\rm agg} = E_{\rm agg}/N_{\rm tot}$ corresponds to the leftmost minimum and
its value changes noticeably with the number of chains. In consequence, the latent heat per
monomer
$\Delta q = \Delta Q/N_{\rm tot} = T_{\rm agg}[\mathcal{S}(E_{\rm frag}) - \mathcal{S}(E_{\rm agg})]/N_{\rm tot}$
that is required to fragment the aggregate increases from two to four chains in the system
(see Table 2). Although the systems under consideration are too small to extrapolate phase
transition properties to the thermodynamic limit, it is obvious that the
aggregation-fragmentation transition exhibits strong similarities to
condensation-evaporation transitions of colloidal systems. Given that, the entropic
transition barrier $\Delta q/T_{\rm agg}$, which we see increasing with the number of
chains (cf. the values in Table 2), would survive in the thermodynamic limit and the
transition would be first-order-like. It would be more surprising, however, if the convex
intruder did not disappear, i.e., if the absolute and relative surface entropies
$\Delta\mathcal{S}_{\rm surf}$ and $\Delta s_{\rm surf}$ did not vanish. This is definitely
a question of fundamental interest, as the common claim is that pure surface effects,
typically exhibited only by "small" systems, are irrelevant in the thermodynamic limit.
Answering it requires studies of much larger systems.

It should clearly be noted, however, that protein aggregates that form in biological
systems often consist of only a few peptides and are definitely small. For such systems,
surface effects are responsible for structure formation and are not unimportant side
effects. One should keep in mind that standard thermodynamics and the thermodynamic
limit are somewhat theoretical constructs valid only for very large systems. The
increasing interest in physical properties of small systems, in particular in
conformational transitions in molecular systems, requires in part a revision of dogmatic
thermodynamic views. Indeed, with today's chemo-analytical and experimental equipment,
effects like those described in this chapter should be experimentally verifiable, as
these are real physical effects. For studies of the condensation of atoms, where a similar
behavior occurs, such experiments have actually already been performed [42].



SUMMARY 

The analyses in the previous sections have shown that it is indeed possible to reveal char- 
acteristic features of structure formation processes of polymers, in particular proteins, by 
means of minimalistic coarse-grained models. This is essential, as a generalized view of 
conformational transitions occurring in folding and aggregation processes of molecular 
systems can only possess a solid basis, if a classification of generic features common to 
different systems enables the introduction of suitable models on mesoscopic scales. 

Depending on the heteropolymer sequence, typically two general transitions occur 
in heteropolymer folding processes. One is the folding transition from random coils to 
compact globular conformations common to all heteropolymers (i.e., little
sequence-specific), a finite-length analog to the collapse (or $\Theta$) transition known
from homopolymers. The stability of the globular or intermediary (pseudo)phase of
heteropolymers depends, however, strongly on the heteropolymer sequence. The second
general transition at lower temperature (or worse solvent quality) is a kind of glassy
transition, as it results in the formation of the native conformation(s) with small
entropy. During this transition,
the highly compact hydrophobic core is formed, surrounded by a shell of polar residues 
which screens the core from the solvent. The kinetics of this transition strongly depends 
on the heteropolymer sequence. Hydrophobic-core formation is typically a ("first-order- 
like") phase separation process and the sharpness and height of the free-energy barrier 
separating the hydrophobic-core and globular (pseudo)phases are measures for the sta- 
bility of the hydrophobic core. It is assumed that a large set of the comparatively few 
functional bioproteins in nature exhibits such a large barrier preventing unfolding into 
nonfunctional conformations. This is also one of the common arguments, why under 
physiological conditions only a very small number among the possible protein sequences 
can be functional at all. 

Although the mesoscopic models are still extremely minimalistic, we have found 
quite surprising characteristic folding features comparable to those of real proteins. 
Analyzing transition channels and free-energy landscapes based on a suitably defined 
similarity or "order" parameter, we identified folding behaviors which are likewise known 
from real proteins: two-state folding with a single kinetic barrier and a unique 
native state, folding towards the native fold through intermediates over different barriers, 
and metastability with different, almost degenerate, native states. 

We have also discussed in detail thermodynamic properties of peptide aggregation 
processes. We compared small systems of different numbers of short peptides and 
investigated finite-size properties in the canonical and in the microcanonical ensemble. Each 
of these analyses has advantages. Applying the canonical formalism reveals strong 
fluctuations in the vicinity of the aggregation transition. These allow not only for a precise 
estimation of the aggregation transition temperature of a finite system, but also for a 
finite-size scaling analysis toward the system with infinitely many chains. 
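
As a rough illustration of the canonical analysis, the following sketch estimates a 
transition temperature from the peak of the energy fluctuations, 
C_V(T) = (<E^2> - <E>^2) / T^2 with k_B = 1. It assumes that an estimate of the density 
of states g(E) is already available, for example from a generalized-ensemble simulation. 
The function names and the two-level toy data are illustrative assumptions and are not 
the models or the code used for the results reviewed here. 

```python
import math

def canonical_heat_capacity(energies, log_g, temperature):
    """Return C_V = (<E^2> - <E>^2) / T^2 (k_B = 1) from ln g(E)."""
    beta = 1.0 / temperature
    # Canonical log-weights ln g(E) - beta * E, shifted by their maximum
    # so that the exponentials cannot overflow.
    log_w = [lg - beta * e for e, lg in zip(energies, log_g)]
    shift = max(log_w)
    weights = [math.exp(x - shift) for x in log_w]
    z = sum(weights)
    mean_e = sum(w * e for w, e in zip(weights, energies)) / z
    mean_e2 = sum(w * e * e for w, e in zip(weights, energies)) / z
    return (mean_e2 - mean_e ** 2) / temperature ** 2

def peak_temperature(energies, log_g, temperatures):
    """Return the grid temperature at which C_V is largest."""
    return max(temperatures,
               key=lambda t: canonical_heat_capacity(energies, log_g, t))

# Hypothetical toy input: one low-energy state and 1000 degenerate
# high-energy states (a crude stand-in for "aggregated" vs. "fragmented").
toy_energies = [0.0, 1.0]
toy_log_g = [0.0, math.log(1000.0)]
grid = [0.01 * k for k in range(5, 201)]  # T = 0.05 ... 2.00
t_peak = peak_temperature(toy_energies, toy_log_g, grid)
```

For a finite system the peak has a finite width, so `t_peak` is only one of several 
possible definitions of the transition temperature. Repeating the estimate for several 
system sizes is the starting point of a finite-size scaling analysis. 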

Analyzing the aggregation of a few peptides from the microcanonical perspective, 
however, uncovers an underlying physical effect, the backbending effect, which is largely 
"averaged out" in the canonical analysis. "Backbending" means that in the transition region 
the caloric temperature decreases with increasing energy. This is due to surface effects: 
additional energy does not lead to an increase of the caloric temperature; rather, it is 
used to rearrange monomers in order to reduce surface tension at the expense of entropy. 
In effect, the protein complex is getting colder. For an increasing number of peptides 
in the system, we could show, however, that the effect becomes less relevant, although 
the latent heat increases and thus the first-order character of this phase-separation 
process is getting stronger. Nonetheless, in biological aggregation processes typically only 
a few proteins are involved, and thus the effect should be apparent. The "physical reality" 
of this effect has already been confirmed in atomic cluster formation experiments. 
However, the experimental verification in polymeric systems is still pending. 
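
The backbending criterion can be made concrete with a minimal sketch. It assumes a 
microcanonical entropy S(E) = ln g(E) tabulated on an energy grid, takes the caloric 
temperature as T(E) = (dS/dE)^(-1) with k_B = 1, and reports the energy interval in which 
T(E) decreases with increasing energy. The function names and the quartic toy entropy are 
hypothetical and serve only as an illustration; they are not taken from the simulations 
discussed here. 

```python
def caloric_temperature(energies, entropy):
    """Central-difference estimate of T(E) = 1 / (dS/dE) at interior grid points.

    Assumes dS/dE > 0 everywhere, i.e., positive temperatures.
    """
    curve = []
    for i in range(1, len(energies) - 1):
        d_s = entropy[i + 1] - entropy[i - 1]
        d_e = energies[i + 1] - energies[i - 1]
        curve.append((energies[i], d_e / d_s))
    return curve

def backbending_region(energies, entropy):
    """Return (E_low, E_high) where T(E) falls with rising E, or None."""
    curve = caloric_temperature(energies, entropy)
    falling = [e for (e, t), (_, t_next) in zip(curve, curve[1:]) if t_next < t]
    return (min(falling), max(falling)) if falling else None

# Hypothetical toy entropy with a convex intruder: its slope
# dS/dE = 1 - 0.1 * (E**3 - E) increases for |E| < 1/sqrt(3).
toy_e = [-2.0 + 0.01 * i for i in range(401)]
toy_s = [e - 0.025 * e ** 4 + 0.05 * e ** 2 for e in toy_e]
region = backbending_region(toy_e, toy_s)
```

Entropies estimated from simulations are noisy, so in practice S(E) would need smoothing 
before differentiation; the sketch omits this step. 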

One of the essential questions in aggregation processes among polymers is how the 
mutual influence induces conformational changes. Two potential scenarios leading to the 
formation of complexes are conceivable. If the external force is attractive, but weaker 
than the intrinsic, intermonomeric forces that form the polymer or protein conformation, 
the proximity of an attractive polymer or substrate is not sufficient to refold the polymer 
and the aggregation is a simple docking process. Unless the match is perfect, the binding 
force that holds the compound together is rather weak. On the other hand, if the external 
force entails refolding of the polymer, it can better adapt to the target structure (e.g., 
a crystalline substrate), or, if both partners experience conformational changes, a new, 
highly compact compound can form. In this so-called coupled folding-binding process, 
the binding force is typically stronger than in the docking case. In our aggregation study 
of a few short peptides, we observed such a behavior. 

Our results were mainly obtained by means of sophisticated generalized-ensemble 
chain-growth and Markov chain Monte Carlo computer simulations, partly newly developed 
or generalized for these purposes. Even with today's equipment, computer simulations of 
polymers, in particular proteins, are extremely demanding, and efficient algorithms are 
required. Despite the enormous progress in protein research in the past few years, 
uncovering the principal secrets of cooperative conformational activity in structure 
formation processes of proteins will remain one of the biggest scientific challenges 
of the future. 